Tag upstream API errors at error level with service/upstreamStatus/at…#848
Tag upstream API errors at error level with service/upstreamStatus/at…#848swdpcomputing wants to merge 4 commits intomainfrom
Conversation
d9ccd63 to
d83cbce
Compare
b3d601a to
09908b6
Compare
09908b6 to
28a12bc
Compare
|
| return apiResponse | ||
| } catch (error) { | ||
| debug(LogCodes.SYSTEM.EXTERNAL_API_ERROR, { endpoint: 'agreements', errorMessage: error.message }, request) | ||
| logAgreementsUpstreamError(request, error) |
There was a problem hiding this comment.
I'm not sure about this one 😂 clearly trying to get around the ESLint rule by encapsulating the log in another function. Let's discuss this with @alanplatt, as it might be wrong to wholly deny log in catch blocks.
There was a problem hiding this comment.
Not dodging the rule! log/logger are already on the exclude list in eslint.config.js. Pulled it into a helper because the payload got chunky. Same shape as handleVerificationError in verify-token.js. Happy to inline it, or chat with Alan.
There was a problem hiding this comment.
hehe the function logAgreementsUpstreamError is just a wrapper for log though right? Reason I mentioned Alan here is that he didn't want catch blocks to be using log, maybe that's not the right approach?



When grants-ui-backend goes down (like the failed deployment in dev that returned 502s), grants-ui in front of it returns a 500. The logs say "the BE is the cause" and are emitted at debug level and don't include the upstream HTTP status or which service had failed, so they were filtered out by default and nobody knew why grants-ui was 500ing.
This PR:
Tags every outgoing-call error with structured felds. Every call to grants-ui-backend, GAS, agreements, or consolidated-view now logs service, upstreamStatus, and (on retry-exhaustion) attempts.
Promotes those logs from debug to error level. The previous debug() wrapper forced everything down to debug regardless of declared level. This swaps to log() which respects the declared level.
So now an alert query like level=error AND service=grants-ui-backend AND upstreamStatus>=500 now matches the next time the BE fails, so we hear about it. User-facing behaviour is unchanged, the 500 page still renders. The alerting-channel investigation (second half of the ticket) is a separate config task.